EDA for Affective AI datasets
Imports
KDEF Dataset
The dataset files are named using a specific convention that encodes information about the subject, expression, and angle of the photograph. The naming convention is as follows:
Example: AF01ANFL.JPG
- Letter 1: Session
  - A = series one
  - B = series two
- Letter 2: Gender
  - F = female
  - M = male
- Letter 3 & 4: Identity number
  - 01-35
- Letter 5 & 6: Expression
  - AF = afraid
  - AN = angry
  - DI = disgusted
  - HA = happy
  - NE = neutral
  - SA = sad
  - SU = surprised
- Letter 7 & 8: Angle
  - FL = full left profile
  - HL = half left profile
  - S = straight
  - HR = half right profile
  - FR = full right profile
- Extension: Picture format
  - JPG = jpeg (Joint Photographic Experts Group)
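The convention above can be decoded mechanically. The following is a minimal sketch, not the notebook's own code: the function name `parse_kdef_filename` and the returned dictionary layout are assumptions made for illustration. It relies only on the codes listed above, including the fact that the straight angle uses the single letter S.

```python
import re

# Code tables taken from the naming convention above.
SESSIONS = {"A": "series one", "B": "series two"}
GENDERS = {"F": "female", "M": "male"}
EXPRESSIONS = {
    "AF": "afraid", "AN": "angry", "DI": "disgusted", "HA": "happy",
    "NE": "neutral", "SA": "sad", "SU": "surprised",
}
ANGLES = {
    "FL": "full left profile", "HL": "half left profile", "S": "straight",
    "HR": "half right profile", "FR": "full right profile",
}

KDEF_NAME = re.compile(
    r"(?P<session>[AB])(?P<gender>[FM])(?P<identity>\d{2})"
    r"(?P<expression>AF|AN|DI|HA|NE|SA|SU)(?P<angle>FL|HL|S|HR|FR)\.JPG"
)


def parse_kdef_filename(filename):
    """Decode a KDEF file name (hypothetical helper); raise ValueError if invalid."""
    match = KDEF_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a valid KDEF file name: {filename!r}")
    identity = int(match["identity"])
    if not 1 <= identity <= 35:
        raise ValueError(f"identity number out of range 01-35: {filename!r}")
    return {
        "session": SESSIONS[match["session"]],
        "gender": GENDERS[match["gender"]],
        "identity": identity,
        "expression": EXPRESSIONS[match["expression"]],
        "angle": ANGLES[match["angle"]],
    }


print(parse_kdef_filename("AF01ANFL.JPG"))
# {'session': 'series one', 'gender': 'female', 'identity': 1,
#  'expression': 'angry', 'angle': 'full left profile'}
```

A name that does not fit the pattern, such as "AF31V.JPG", raises `ValueError` instead of returning a partial result.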
Dataset Statistics
- Participants: 70 (35 males and 35 females).
- Age: mean 25 years, ranging from 20 to 30 years.
- Expressions: 7 (neutral, happy, angry, afraid, disgusted, sad, surprised).
- Angles: 5 (full left profile, half left profile, straight, half right profile, full right profile).
- Sessions: 2.
- Number of pictures: 4900.
- Size: 562 * 762 pixels.
- Resolution: 72*72 dpi.
- Colors: 16.7 million (32 bit).
- Size inflated: 1.6 MB.
- Size compressed: approximately 122 KB (ranging from 85 to 158 KB).
- File format: JPEG.
- Compression quality: 94 %.
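Two of the figures above follow from the others, which gives a quick sanity check. The sketch below assumes that the inflated size means the uncompressed bitmap at 4 bytes per pixel (the stated 32 bit) and that megabytes are counted in units of 2**20 bytes; the variable names are illustrative.

```python
# Picture count: every participant poses each expression at each angle, in each session.
participants = 70
expressions = 7
angles = 5
sessions = 2
total_pictures = participants * expressions * angles * sessions
print(total_pictures)  # 4900

# Uncompressed size of one picture, assuming 4 bytes per pixel (32 bit).
width, height = 562, 762
bytes_per_pixel = 4
inflated_bytes = width * height * bytes_per_pixel
inflated_mib = inflated_bytes / 2**20
print(inflated_bytes)          # 1712976
print(round(inflated_mib, 2))  # 1.63
```

Both results agree with the listed values of 4900 pictures and roughly 1.6 MB per inflated picture.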
Check if files are valid
For each directory (person), check:
- If it contains the same number of images as every other person (35)
For each image, check:
- If it is a JPG
- If its name starts with the directory name
- If the emotion code is valid
- If the angle code is valid
- If its size is 562x762
- If it is RGB
Count the distinct emotion codes and angle codes in each directory; there should be 7 and 5, respectively.
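The checks above could be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's original code: the function names (`check_name`, `check_image`, `check_directory`) are hypothetical, Pillow is assumed as the image library, 562x762 is taken to mean width by height, and the JPG test accepts the extension in any letter case.

```python
from pathlib import Path

EMOTIONS = {"AF", "AN", "DI", "HA", "NE", "SA", "SU"}
ANGLES = {"FL", "HL", "S", "HR", "FR"}
EXPECTED_SIZE = (562, 762)  # assumed to be (width, height)
IMAGES_PER_DIR = 35


def check_name(filename, directory_name):
    """Return a list of naming problems for one file (empty if none)."""
    problems = []
    path = Path(filename)
    if path.suffix.upper() != ".JPG":
        problems.append("not a JPG file")
    if not path.stem.startswith(directory_name):
        problems.append("name does not start with the directory name")
    # Characters after session, gender and the two-digit identity number.
    code = path.stem[4:]
    emotion, angle = code[:2], code[2:]
    if emotion not in EMOTIONS:
        problems.append(f"invalid emotion code {emotion!r}")
    if angle not in ANGLES:
        problems.append(f"invalid angle code {angle!r}")
    return problems


def check_image(path):
    """Return a list of size and mode problems for one image file."""
    from PIL import Image  # Pillow, imported lazily so name checks run without it

    problems = []
    try:
        with Image.open(path) as img:
            if img.size != EXPECTED_SIZE:
                problems.append(f"unexpected size {img.size}")
            if img.mode != "RGB":
                problems.append(f"unexpected mode {img.mode}")
    except OSError as exc:  # includes files Pillow cannot identify
        problems.append(f"cannot read image: {exc}")
    return problems


def check_directory(directory):
    """Map each problematic file (or the directory itself) to its problems."""
    directory = Path(directory)
    files = sorted(p for p in directory.iterdir() if p.is_file())
    report = {}
    if len(files) != IMAGES_PER_DIR:
        report[directory.name] = [
            f"expected {IMAGES_PER_DIR} images, found {len(files)}"
        ]
    for path in files:
        problems = check_name(path.name, directory.name) + check_image(path)
        if problems:
            report[path.name] = problems
    # Distinct emotion and angle codes seen in this directory.
    emotions = {p.stem[4:6] for p in files}
    angles = {p.stem[6:] for p in files}
    if (len(emotions), len(angles)) != (len(EMOTIONS), len(ANGLES)):
        report.setdefault(directory.name, []).append(
            f"found {len(emotions)} emotion codes and {len(angles)} angle codes"
        )
    return report
```

Returning a list of problems per file, rather than stopping at the first failure, keeps every issue visible in a single pass over the dataset. An empty report from `check_directory` means the directory passed all checks.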
Failures:
- 2 instances failed the naming convention check: "AF31V.JPG", "AM31H.JPG" (fix: renamed)
- 2 instances failed the image size check: "AM05ANFR.JPG", "BM10ANFR.JPG", size 970 x 633 (fix: manually cropped)
- 10 instances failed because the image is mostly black (mean pixel value < 10):
  - "AF01SUFR.JPG", "AF10AFFR.JPG", "AF11NEHL.JPG", "AF20DIHL.JPG", "AM25DIFL.JPG", "AM34DIFR.JPG", "BF13NEHR.JPG", "BM21DIFL.JPG", "BM22DIHL.JPG", "BM24DIFL.JPG" (fix: TODO)
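A mostly-black check of this kind might look like the sketch below. It assumes Pillow and takes the mean over a grayscale conversion of the image; the source only states the threshold (mean pixel value < 10), so the grayscale step, the function names, and the `mean_fn` parameter are assumptions for illustration.

```python
BLACK_THRESHOLD = 10.0  # threshold stated in the failure list above


def mean_gray_level(path):
    """Mean pixel value (0-255) of the image after conversion to grayscale."""
    from PIL import Image, ImageStat  # Pillow, imported lazily

    with Image.open(path) as img:
        # ImageStat.Stat(...).mean holds one value per band; "L" has a single band.
        return ImageStat.Stat(img.convert("L")).mean[0]


def find_mostly_black(paths, threshold=BLACK_THRESHOLD, mean_fn=mean_gray_level):
    """Return the paths whose mean pixel value is below the threshold."""
    return [path for path in paths if mean_fn(path) < threshold]
```

Assuming the images live under a directory named `KDEF` (a placeholder path), the check could be run as `find_mostly_black(sorted(Path("KDEF").rglob("*.JPG")))` after `from pathlib import Path`. The `mean_fn` parameter allows a different statistic, such as a per-channel mean, to be swapped in without changing the loop.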